Object matching on real-world problems
Abstract
Object matching (also referred to as duplicate identification, record linkage, entity resolution, or reference reconciliation) is a crucial task for data integration and data cleaning. The task is to detect multiple representations of the same real-world object. This is particularly challenging for objects that are highly heterogeneous and of limited data quality, e.g., regarding the completeness and consistency of their descriptions. To gain a better overview of the current state of the art in object matching, we survey the existing frameworks and their evaluations. We review various frameworks published in the literature according to a set of defined criteria, characterize them in some detail, and compare them with each other and with our own framework, FEVER. With FEVER we introduce a new generic and comprehensive framework for object matching and for the comparative evaluation of object matching approaches. FEVER offers numerous operators for constructing non-learning as well as learning-based match workflows. Moreover, FEVER allows match approaches to be automatically executed and evaluated under different parameter configurations. FEVER therefore provides a platform for a comparative evaluation of the relative effectiveness and efficiency of alternative match approaches. Despite the large amount of recent research on object matching, no such evaluation has yet been carried out. With FEVER we fill this gap and present an evaluation of existing implementations on challenging real-world match tasks. We use the FEVER framework to execute the approaches automatically and to find favourable parameter settings in a comparable way. We consider approaches both with and without machine learning for finding a suitable parameterization and combination of similarity functions. In addition to approaches from the research community, we also consider a state-of-the-art commercial object matching implementation.
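The idea of running a non-learning match approach under several parameter configurations and scoring each one can be illustrated with a minimal sketch. This is not FEVER's API: the toy records, the gold standard, the function names, and the choice of `difflib.SequenceMatcher` as the similarity function are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy data: two sources with overlapping product descriptions.
SOURCE_A = {
    "a1": "Canon EOS 600D Digital SLR Camera",
    "a2": "Apple iPod nano 16GB silver",
    "a3": "Sony Bravia 40 inch LCD TV",
}
SOURCE_B = {
    "b1": "canon eos 600d dslr camera",
    "b2": "apple ipod nano 16 gb (silver)",
    "b3": "Samsung Galaxy Tab 10.1",
}
# Hypothetical gold standard: pairs that refer to the same real-world object.
GOLD = {("a1", "b1"), ("a2", "b2")}


def similarity(x, y):
    """Case-insensitive string similarity in [0, 1] (an assumed stand-in)."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def match(threshold):
    """Compare every pair across the sources; keep pairs at or above threshold."""
    return {
        (id_a, id_b)
        for (id_a, text_a), (id_b, text_b) in product(SOURCE_A.items(), SOURCE_B.items())
        if similarity(text_a, text_b) >= threshold
    }


def evaluate(predicted, gold):
    """Return (precision, recall, F1) of predicted pairs against the gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def sweep(thresholds):
    """Evaluate the same match approach under several threshold settings."""
    return {t: evaluate(match(t), GOLD) for t in thresholds}


if __name__ == "__main__":
    for threshold, (p, r, f1) in sweep([0.5, 0.6, 0.7, 0.8, 0.9]).items():
        print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The sketch compares all pairs across the two sources, which is only feasible for tiny inputs. It also covers only the threshold parameter of a single similarity function, whereas the abstract describes combinations of similarity functions and learning-based workflows.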
Our results indicate significant quality and efficiency differences between the approaches. We also find that some challenging matching tasks, such as matching product offers from online shops, are not sufficiently solved by conventional approaches based on the similarity of attribute values. Furthermore, this thesis addresses the product offer matching problem. Product of-
Similar resources
Categorization via Agglomerative Correspondence Clustering
This paper presents computationally efficient object detection, matching and categorization via Agglomerative Correspondence Clustering (ACC). We implement ACC for feature correspondence and object-based image matching, exploiting both photometric similarity and geometric consistency from local invariant features. Object-based image matching is formulated here as an unsupervised multi-class clust...
Adaptive Approximate Record Matching
Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
Object Search Using Orientation Code Matching
A new method for object search is proposed. The proposed scheme is based on matching gradient information around each pixel, computed in the form of orientation codes, rather than the gray levels directly, and is robust against irregularities occurring in real-world scenes. A probabilistic model for robust matching is given and verified with real image data. Experimental results for real world...
Familiarity-matching in decision making: Experimental studies on cognitive processes and analyses of its ecological rationality
Previous studies have shown that individuals often make inferences based on heuristics using recognition, fluency, or familiarity. In the present study, we propose a new heuristic called familiarity-matching, which predicts that when a decision maker is familiar (or unfamiliar) with an object in a question sentence, s/he will choose the more (or less) familiar object from the two alternatives. ...
Statistical/Geometric Techniques for Object Representation and Recognition
Title of dissertation: Statistical/Geometric Techniques for Object Representation and Recognition. Soma Biswas, Doctor of Philosophy, 2009. Directed by Professor Rama Chellappa, Department of Electrical and Computer Engineering. Object modeling and recognition are key areas of research in computer vision and graphics with a wide range of applications. Though research in these areas is not new, traditi...
Publication date: 2014